 optimal sparse decision tree


Reviews: Optimal Sparse Decision Trees

Neural Information Processing Systems

Originality: Training optimal decision trees is a problem with a large body of prior work. This submission is distinguished by its focus on optimal *sparse* decision trees for binary variables and by an approach that appears feasible in practice, which it achieves by combining analytical bounds that reduce the search space with efficient implementation techniques. The work builds on the CORELS algorithm and its approach to constructing optimal decision lists. The authors extend this approach to decision trees in a non-trivial manner that adds substantial novelty.

Quality: The claims of the paper are very well supported by theoretical analysis as well as experiments.


Reviews: Optimal Sparse Decision Trees

Neural Information Processing Systems

Reviewers are very positive about the paper. The contribution is clear and significant. The paper should clearly be accepted. The authors should take into account all reviewers' comments when preparing the final version of their paper, as promised in their response, in particular the improvements suggested by reviewer 1 (as I agree that the paper is heavy on notation and not totally self-contained).


Optimal Sparse Decision Trees

Hu, Xiyang; Rudin, Cynthia; Seltzer, Margo

Neural Information Processing Systems

Decision tree algorithms have been among the most popular algorithms for interpretable (transparent) machine learning since the early 1980s. The problem that has plagued decision tree algorithms since their inception is their lack of optimality, or lack of guarantees of closeness to optimality: decision tree algorithms are often greedy or myopic, and sometimes produce unquestionably suboptimal models. The hardness of decision tree optimization is both a theoretical and a practical obstacle, and even careful mathematical programming approaches have not been able to solve these problems efficiently. This work introduces the first practical algorithm for optimal decision trees for binary variables. The algorithm is a co-design of analytical bounds that reduce the search space and modern systems techniques, including data structures and a custom bit-vector library.


Optimal Sparse Decision Trees

Hu, Xiyang; Rudin, Cynthia; Seltzer, Margo

arXiv.org Machine Learning

Decision tree algorithms have been among the most popular algorithms for interpretable (transparent) machine learning since the early 1980s. The problem that has plagued decision tree algorithms since their inception is their lack of optimality, or lack of guarantees of closeness to optimality: decision tree algorithms are often greedy or myopic, and sometimes produce unquestionably suboptimal models. The hardness of decision tree optimization is both a theoretical and a practical obstacle, and even careful mathematical programming approaches have not been able to solve these problems efficiently. This work introduces the first practical algorithm for optimal decision trees for binary variables. The algorithm is a co-design of analytical bounds that reduce the search space and modern systems techniques, including data structures and a custom bit-vector library. We highlight possible steps toward improving the scalability and speed of future generations of this algorithm based on insights from our theory and experiments.
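
To make the abstract's mention of sparsity and bit vectors concrete, here is a minimal sketch in Python. It is not the authors' implementation: it assumes a regularized objective of the form (misclassification rate) + lam * (number of leaves), and it uses plain Python integers as stand-in bit vectors instead of the custom library the abstract refers to. All function names, the toy data, and the value of `lam` are hypothetical.

```python
from typing import List


def to_bitvector(column: List[int]) -> int:
    """Pack a 0/1 column into one Python int; bit i holds sample i."""
    bits = 0
    for i, value in enumerate(column):
        if value:
            bits |= 1 << i
    return bits


def popcount(bits: int) -> int:
    """Number of set bits, i.e. number of samples in the set."""
    return bin(bits).count("1")


def leaf_errors(captured: int, labels: int) -> int:
    """Misclassified samples in a leaf that predicts its majority label."""
    positives = popcount(captured & labels)
    negatives = popcount(captured) - positives
    return min(positives, negatives)


def single_leaf_objective(labels: int, n: int, lam: float) -> float:
    """Assumed objective for a tree with one leaf: error rate + lam * 1."""
    all_samples = (1 << n) - 1
    return leaf_errors(all_samples, labels) / n + lam * 1


def split_objective(feature: int, labels: int, n: int, lam: float) -> float:
    """Assumed objective for a one-split tree: error rate + lam * 2."""
    all_samples = (1 << n) - 1
    left = feature                   # samples where the feature is 1
    right = all_samples & ~feature   # samples where the feature is 0
    errors = leaf_errors(left, labels) + leaf_errors(right, labels)
    return errors / n + lam * 2


# Toy data (hypothetical): six samples, two binary features.
n = 6
y = to_bitvector([1, 1, 0, 0, 1, 0])
feature_a = to_bitvector([1, 1, 0, 0, 1, 0])  # separates the labels exactly
feature_b = to_bitvector([1, 0, 1, 0, 1, 0])  # only weakly informative

lam = 0.1
no_split = single_leaf_objective(y, n, lam)
split_a = split_objective(feature_a, y, n, lam)
split_b = split_objective(feature_b, y, n, lam)
```

With `lam = 0.1`, both splits score better than the single leaf, and the split on `feature_a` scores best. Raising `lam` to 0.4 makes the split on `feature_b` worse than not splitting at all, which is the sense in which the per-leaf penalty favors sparse trees. Each leaf's error count comes from a bitwise AND followed by a population count, the kind of operation a bit-vector representation makes cheap.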